Abstract: We present a joint source-channel multiple description
(JSC-MD) framework for resource-constrained network
communications (e.g., sensor networks), in which one or many 
deprived encoders communicate a Markov source against bit 
errors and erasure errors to many heterogeneous decoders, some 
powerful and some deprived. To keep the encoder complexity at a
minimum, the source is coded into K descriptions by a simple
multiple description quantizer (MDQ) with neither entropy nor 
channel coding. The code diversity of MDQ and the path 
diversity of the network are exploited by decoders to correct 
transmission errors and improve coding efficiency. A key design 
objective is resource scalability: powerful nodes in the network 
can perform JSC-MD distributed estimation/decoding under the 
criteria of maximum a posteriori probability (MAP) or minimum 
mean-square error (MMSE), while primitive nodes resort to 
simpler MD decoding, all working with the same MDQ code. 
The application of JSC-MD to distributed estimation of hidden 
Markov models in a sensor network is demonstrated. 

The proposed JSC-MD MAP estimator is a longest-path
algorithm on a weighted directed acyclic graph, while the JSC-MD
MMSE decoder is an extension of the well-known forward-backward
algorithm to multiple descriptions. Both algorithms
simultaneously exploit the source memory, the redundancy of 
the fixed-rate MDQ, and the inter-description correlations. They 
outperform the existing hard-decision MDQ decoders by large 
margins (up to 8 dB). For Gaussian Markov sources, the complexity
of JSC-MD distributed MAP sequence estimation can be
made as low as that of typical single description Viterbi-type 
algorithms. 

The new JSC-MD framework also enjoys an operational 
advantage over the existing MDQ decoders. It eliminates the 
need for multiple side decoders to handle different combinations 
of the received descriptions by unifying the treatments of all these 
possible cases. 

Keywords: Multiple descriptions, distributed sequence es- 
timation, joint source-channel coding, hidden Markov model, 
forward-backward algorithm, sensor networks, complexity. 



I. Introduction 

We propose a joint source-channel multiple description 
(JSC-MD) framework for distributed communication and es- 
timation of memory sources. The JSC-MD framework is de- 
signed to suit lossy networks populated by resource-deprived 
transmitters and receivers of varied capabilities. Such a sce- 
nario is common in sensor networks and wireless networks. 
For instance, a large number of inexpensive sensors with no or 
low maintenance are deployed to monitor, assess, and react to a
large environment. On one hand these sensors have to conserve 
energy to ensure a long lifespan, and on the other hand they 
need to communicate with processing centers and possibly also 
among themselves in volatile and adverse network conditions. 
The energy budget and equipment level of the receivers vary 
greatly, ranging from powerful processing centers to deprived 
sensors themselves. The heterogeneity is also the norm in 
consumer-oriented wireless networks. A familiar and popular 
application is multimedia streaming with mobile devices such 
as handsets, personal digital assistants (PDAs), and notebook
computers. Again, battery life is a primary concern for all mobile
data transmitters, while its criticality varies for receivers,
depending on whether the receivers are cell phones, notebooks, 
base stations, etc. 

Conventional source and channel coding techniques may not 
be good choices for networks of resource-constrained nodes, 
because they make coding gains proportional to computational 
complexity (hence energy consumption). The need for power-aware
signal compression techniques has generated renewed
interest in the theory of Slepian-Wolf and Wyner-Ziv coding,
which was developed more than thirty years ago [1], [2].
The key insight of these works is that statistically dependent 
random sources can be encoded independently without loss 
of rate-distortion performance, if the decoder has the knowl- 
edge or side information about such dependencies. Although 
originally intended for distributed source coding, the approach 
of Slepian-Wolf and Wyner-Ziv coding is of significance to 
resource-constrained compression in two aspects: 

1) communication or coordination between the encoders 
of the different sources is not necessary to achieve 
optimal compression, even if the sources are statistically 
dependent, saving the energy to communicate between 
the encoders; 

2) it is possible to shift heavy computation burdens of rate- 
distortion optimal coding of dependent sources from 
encoders to decoders. 

Such an asymmetric codec design provides an attractive signal 
compression solution in situations where a large number of 
resource-deprived and autonomous encoders need to commu- 
nicate multiple statistically dependent sources to one or more 
capable decoders, as is the case for some hierarchical sensor 
networks [3]. 

Recently, many researchers have been enthusiastically in- 
vestigating practical Wyner-Ziv video coding schemes [4], [5], 
seeking energy-conserving solutions for video streaming on
mobile devices. The motive is to perform video compression
without computationally expensive motion compensation at
the encoder, departing from the prevailing MPEG practice.
Instead, the decoder is responsible for exploiting the interframe
correlations to achieve coding efficiency.

While Wyner-Ziv coding can shift computational complex- 
ity of signal compression from encoders to decoders, it does 
not address another characteristic of modern communication 
networks: uneven distribution of resources at different nodes. 
As mentioned earlier, decoders can differ greatly in power 
supply, bandwidth, computing capability, response time, and 
other constraints. What can be done if a decoder has to 
operate under severe resource constraints as well? Despite 
the information theoretical promise of Wyner-Ziv coding, the 
rate-distortion performance of distributed compression is oper- 
ationally bounded by the intrinsic complexity of the problem, 
or equivalently by the energy budget. It is well known that 
optimal rate-distortion compression in centralized form is NP- 
hard [6]. We have no reason to believe that approaching the 
Wyner-Ziv limit is computationally any easier. 

Given the conflict between energy conservation and coding 
performance, it is desirable to have a versatile signal coding 
and estimation approach whose performance can be scaled to 
available energy, which is the notion of resource scalability 
of this paper. The key design criterion is to keep the complexity
of the encoders (often synonymous with sensors in sensor
networks) at a minimum, while allowing a wide range of trade-offs
between the complexity and rate-distortion performance at
decoders. Depending on the availability of energy, bandwidth, 
CPU power, and other resources, different decoders should be 
able to reconstruct the same coded signal(s) on a best-effort
basis. We emphasize that the same code stream or the same set of
code streams (in the case of multiple descriptions) of one or more
sources is generated and transmitted for an entire network. By
not generating different codes of a source for different decoder
specifications, encoders save the energy needed to generate
multiple codes. Furthermore, this simplifies and modularizes
the encoder (sensor) design, reducing the manufacturing cost.
Ideally, a resource-scalable code should not deny a decoder 
without resource constraint the possibility of approaching 
the Wyner-Ziv performance limit, and at the same time it 
should allow even the least capable decoder in the network to 
reconstruct the signal, barring complete transmission failure. 

This paper will show how resource-scalable networked sig- 
nal communication and estimation can be realized by multiple 
description quantization (MDQ) at encoders and joint source- 
channel (JSC) estimation at decoders. To keep the encoder 
complexity at a minimum, a source is compressed by fixed-rate
MDQ with neither entropy nor channel coding. The code
diversity of MDQ and the path diversity of the network 
are intended to be exploited by JSC decoding to combat 
transmission errors and gain coding efficiency. Various JSC 
estimation techniques will be introduced to provide solutions 
of different complexities and performances, ranging from the 
fast and simple hard-decision decoder to sophisticated graph 
theoretical decoders. 

When used for MDQ decoding, the proposed JSC-MD
approach has an added operational advantage over the current
MDQ design. It generates an output sequence (the most
probable one given the source and channel statistics) consisting
entirely of the codewords of the central quantizer, rather than
a mixture of codewords of the central and K side decoders. As
such the JSC-MD approach offers a side benefit of unifying
the treatment of the $2^K$ cases for different subsets of received
descriptions. Instead of employing $2^K - 1$ decoders as required
by the existing MDQ decoding process, we need only one
MDQ decoder. This overcomes a great operational difficulty
currently associated with the MDQ decoding process.

Fig. 1. Block diagram of an MDQ-based communication system with a JSC-MD decoder.

The presentation flow of this paper is as follows. Section II
formulates the JSC-MD problem. Section III constructs a
weighted directed acyclic graph to model the JSC-MD MAP
estimation/decoding problem. This graph construction converts
distributed MAP estimation into a longest-path problem in
the graph, which is solvable in polynomial time. The complexity
results are derived. Section IV applies the proposed JSC-MD
approach to distributed MAP estimation of hidden Markov
state sequences in lossy networks. This problem is motivated
by sensor networks of heterogeneous nodes with resource
scalability requirements. With the same MD code transmitted
over the entire network, the empowered MD decoders can
obtain the exact MAP solution using a graph theoretical algorithm,
while deprived MD decoders can obtain approximate solutions
using algorithms of various complexities. Section V investigates
the problem of distributed MMSE decoding of MDQ.
It turns out that JSC-MD MMSE decoding can be performed
by generalizing the well-known forward-backward algorithm
to multiple descriptions. Simulation results are reported in
Section VI. Section VII concludes.

II. Problem Formulation 

Fig. 1 schematically depicts the JSC-MD system motivated
in the introduction. The input to the system is a finite Markov
sequence $X^M = X_1, X_2, \ldots, X_M$. A K-description MDQ
first maps a source symbol (if multiple description scalar
quantization (MDSQ) is used) or a block of source symbols
(if multiple description vector quantization (MDVQ) is used)
to a codeword of the central quantizer
$q : \mathbb{R} \to C = \{c_1, c_2, \ldots, c_L\}$, where L is the number of codecells of the
central quantizer. Let the codebooks of the K side quantizers
be $C_k = \{c_{k,1}, c_{k,2}, \ldots, c_{k,L_k}\}$, $1 \le k \le K$, where $L_k \le L$
is the number of codecells of side quantizer k, and $L \le \prod_{k=1}^{K} L_k$.
The K-description MDQ is specified by the index assignment
functions $\lambda_k : C \to C_k$ [7]. The redundancy carried by the K
descriptions versus the single description can be reflected by
a rate $1 - \log_2 L / \sum_{k=1}^{K} \log_2 L_k$ [8].
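
As a quick numerical illustration of this redundancy measure, the following sketch evaluates $1 - \log_2 L / \sum_k \log_2 L_k$ for given codebook sizes. The function name `mdq_redundancy` and the example sizes are illustrative assumptions, not values taken from this paper.

```python
import math


def mdq_redundancy(L, side_sizes):
    """Redundancy 1 - log2(L) / sum_k log2(L_k) of a K-description MDQ.

    L          : number of codecells of the central quantizer
    side_sizes : list [L_1, ..., L_K] of side quantizer codebook sizes
    """
    total_side_bits = sum(math.log2(Lk) for Lk in side_sizes)
    return 1.0 - math.log2(L) / total_side_bits


# Hypothetical example: 32 central cells, two side quantizers with 8 cells each.
# The two descriptions carry 6 bits in total for 5 bits of central information.
example = mdq_redundancy(32, [8, 8])
```

Under these assumed sizes the redundancy is 1 - 5/6; it drops to zero when $L = \prod_k L_k$, that is, when the descriptions carry no intentional correlation.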

Due to the expediency on the part of resource-deprived
MDQ encoders, a decoder is furnished with rich forms of 
statistical redundancy: 

• the memory of the Markov source that is unexploited by 
suboptimal source code; 

• residual source redundancy for lack of entropy coding; 

• the correlation that is intentionally introduced among the 
K descriptions of MDQ. 

The remaining question or challenge is naturally how these 
intra- and inter-description redundancies can be fully exploited 
in a distributed resource-constrained environment. 

Let $x = x_1 x_2 \cdots x_N \in C^N$ be the output sequence
produced by the central quantizer, with $N = M$ for MDSQ, or
$N = M/l$ for MDVQ with $l$ being the VQ dimension. The
K descriptions of MDQ, $\lambda_k(x) \in C_k^N$, $1 \le k \le K$, are
transmitted via K noisy diversity channels. In this work we
use a quite general model for the K diversity channels. The
only requirements are that these channels are memoryless,
independent, and do not introduce phase errors such as insertion
or deletion of code symbols or bits. In the existing literature on
MDQ, only erasure errors are considered in MDQ decoding.
Our diversity channel model accommodates bit errors as well.
This is an important extension because bit errors do
occur in received descriptions in practice, particularly in
wireless network communications. Denote the received code
streams by $y_k = y_{k,1} y_{k,2} \cdots y_{k,N}$, with $y_{k,n}$ being the $n$th
codeword of description k that is observed by the decoder.

Having the source and channel statistics and knowing the
structure of MDQ, the decoder can perform JSC-MD decoding
of sequences $y_k$, $1 \le k \le K$, to best reconstruct x. The
JSC criterion can be maximum a posteriori probability (MAP)
or minimum mean-square error (MMSE). For concreteness
and clarity, we formulate the JSC-MD problem for distributed
MAP decoding of MDQ. As we will see in subsequent sections,
the formulation for other distributed sequence estimation
and decoding problems requires only minor modifications. In
a departure from the current practice of designing multiple
side decoders (up to $2^K - 1$ of them!), our JSC-MD system
offers a single unified MDQ decoder that operates the same
way regardless of which subset of the K descriptions is available
to the decoder. For JSC decoding of single description scalar
quantized Markov sequences, please refer to [9]-[13].

In JSC-MD distributed MAP decoding a decoder reconstructs,
given the observed sequences $y_k$, $1 \le k \le K$ (some
of which may be empty), the input sequence x such that the
a posteriori probability $P(x \mid y_1, y_2, \ldots, y_K)$ is maximized.
Namely, the MAP MDQ decoder emits

$$\hat{x} = \arg\max_{x \in C^N} P(x \mid y_1, y_2, \ldots, y_K). \tag{1}$$

Comparing the proposed JSC MDQ decoder via distributed 
MAP sequence estimation with the existing symbol-by-symbol 
MDQ decoders, one sees an obvious distinction. The JSC 
decoder always generates codewords of the central quantizer 
even when it does not have all the K descriptions, while 
hard-decision MDQ decoders will output codewords of side 
quantizers. 



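By Bayes' theorem we have

$$\begin{aligned}
P(x \mid y_1, y_2, \ldots, y_K)
&= \frac{P(x)\, P(y_1, y_2, \ldots, y_K \mid x)}{P(y_1, y_2, \ldots, y_K)} \\
&\overset{(a)}{\propto} P(x)\, P(y_1, y_2, \ldots, y_K \mid x) \\
&= P(x)\, P(y_1, y_2, \ldots, y_K \mid \lambda_1(x), \lambda_2(x), \ldots, \lambda_K(x)) \\
&\overset{(b)}{=} P(x) \prod_{k=1}^{K} P(y_k \mid \lambda_k(x)) \\
&\overset{(c)}{=} \prod_{n=1}^{N} \Big\{ P(x_n \mid x_{n-1}) \prod_{k=1}^{K} P_k(y_{k,n} \mid \lambda_k(x_n)) \Big\}.
\end{aligned} \tag{2}$$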

In the above derivation, step (a) is due to the fact that $y_1$
through $y_K$ are fixed in the objective function for $x \in C^N$;
step (b) is because of the mutual independence of the K channels;
and step (c) is under the assumption that x, the output of
the central quantizer, is first-order Markovian and the channels
are memoryless. This assumption is a very good approximation
if the original source sequence before MDQ is first-order
Markovian, or a high-order Markov sequence $X^M$ is vector
quantized into K descriptions.

In (2) we also let $P(x_1 \mid x_0) = P(x_1)$ by convention.
$P_k(b' \mid b)$ is the probability of receiving a codeword
$b = b_1 b_2 \cdots b_B$ from channel k as $b' = b'_1 b'_2 \cdots b'_B$. Because the
channel is memoryless, we have

$$P_k(b' \mid b) = \prod_{i=1}^{B} P_k(b'_i \mid b_i). \tag{3}$$

Specifically, if the K diversity channels can be modeled as
memoryless error-and-erasure channels (EEC), where each bit
is either transmitted intact, or inverted, or erased (the erasure
can be treated as the substitution with a new symbol '$'), then
$b \in \{0,1\}^B$, $b' \in \{0,1,\$\}^B$ and

$$P_k(b'_i \mid b_i) = \begin{cases} p_{\phi,k}, & \text{if } b'_i = \$; \\ (1 - p_{\phi,k})(1 - p_{c,k}), & \text{if } b'_i = b_i; \\ (1 - p_{\phi,k})\, p_{c,k}, & \text{otherwise} \end{cases} \tag{4}$$

where $p_{\phi,k}$ is the erasure probability and $p_{c,k}$ is the inversion
or crossover probability for channel k, $1 \le k \le K$.
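
The error-and-erasure model of (3) and (4) can be sketched in a few lines. This is a minimal illustration, assuming codewords are represented as strings over '0', '1' and the erasure symbol '$'; the function names and the numeric probabilities are hypothetical.

```python
ERASED = "$"  # erasure symbol, as in the text


def eec_bit_prob(received, sent, p_erase, p_cross):
    """P_k(b'_i | b_i) for one bit of a memoryless error-and-erasure channel."""
    if received == ERASED:
        return p_erase
    if received == sent:
        return (1.0 - p_erase) * (1.0 - p_cross)
    return (1.0 - p_erase) * p_cross


def eec_codeword_prob(received, sent, p_erase, p_cross):
    """P_k(b' | b): product of per-bit probabilities (memoryless channel)."""
    prob = 1.0
    for r, s in zip(received, sent):
        prob *= eec_bit_prob(r, s, p_erase, p_cross)
    return prob


# Hypothetical example: codeword "011" received as "0$1"
# with erasure probability 0.1 and crossover probability 0.05.
likelihood = eec_codeword_prob("0$1", "011", 0.1, 0.05)
```

In a decoder these per-codeword likelihoods would typically be tabulated once per channel and looked up during sequence estimation.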

In the literature MDQ is mostly advocated as a measure 
against packet erasure errors in diversity networks. Such 
packet erasure errors can be fitted by the above model $P_k(b'_i \mid b_i)$
of the binary memoryless EEC, if a proper interleaver is used.

The proposed JSC-MD framework is also suitable for additive
white Gaussian noise (AWGN) channels. If $b_i$ is binary
phase-shift keying (BPSK) modulated and transmitted through
channel k that is AWGN, then

$$P_k(b'_i \mid b_i) = \frac{1}{\sqrt{\pi \sigma_k}} \exp\!\left( -\frac{(b'_i - b_i)^2}{\sigma_k} \right) \tag{5}$$

where $\sigma_k$ is the noise power spectral density of channel k.

The prior distribution $P(x)$ and transition probability matrix
$P(x_n \mid x_{n-1})$ for the first-order Markov sequence x can be
determined from the source distribution and the particular
MDQ in question.
In the case of MDSQ, if the stationary probability density
function of the source is $p_s(\chi)$ and the conditional probability
density function is $p_s(\chi_n \mid \chi_{n-1})$, then

$$P(x) = \int_{\chi : q(\chi) = x} p_s(\chi)\, d\chi \tag{6}$$

and

$$P(x_n \mid x_{n-1}) = \frac{\iint_{\chi_1 : q(\chi_1) = x_n,\ \chi_2 : q(\chi_2) = x_{n-1}} p_s(\chi_1 \mid \chi_2)\, p_s(\chi_2)\, d\chi_2\, d\chi_1}{\int_{\chi : q(\chi) = x_{n-1}} p_s(\chi)\, d\chi}. \tag{7}$$

If MDVQ is the source coder of the system, the transition
probability matrix $P(x_n \mid x_{n-1})$ can be determined numerically
either from a known closed-form source distribution
or from a training set.
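
One simple way to obtain the transition matrix from a training set is to count consecutive pairs of central-quantizer indices. The sketch below assumes the training sequence has already been quantized to indices 0, ..., L-1; the function name and the additive smoothing constant are assumptions introduced for illustration and are not part of the method described here.

```python
def estimate_transition_matrix(index_seq, L, smoothing=1.0):
    """Return T with T[b][a] approximating P(x_n = a | x_{n-1} = b).

    index_seq : training sequence of central-quantizer indices in range(L)
    smoothing : additive count (an assumption) that keeps unseen
                transitions from receiving probability zero
    """
    counts = [[smoothing] * L for _ in range(L)]
    for prev, cur in zip(index_seq, index_seq[1:]):
        counts[prev][cur] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row])
    return matrix


# Hypothetical toy training sequence over L = 2 codewords.
T = estimate_transition_matrix([0, 1, 0, 1, 0], L=2)
```

Each row of the result sums to one, so it can be used directly as the table of $P(x_n \mid x_{n-1})$ in the decoder.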

III. Joint Source-Channel Multiple Description 
MAP Decoding 

In this section, we devise a graph theoretical algorithm for
JSC-MD MAP decoding. Combining (1) and (2),
we have

$$\hat{x} = \arg\max_{x \in C^N} \sum_{n=1}^{N} \Big\{ \log P(x_n \mid x_{n-1}) + \sum_{k=1}^{K} \log P_k(y_{k,n} \mid \lambda_k(x_n)) \Big\}. \tag{8}$$

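Because of the additivity of (8), we can structure the MAP
estimation problem into the following subproblems:

$$w(n, x_n) = \max_{x^{n-1} \in C^{n-1}} \sum_{i=1}^{n} \Big\{ \log P(x_i \mid x_{i-1}) + \sum_{k=1}^{K} \log P_k(y_{k,i} \mid \lambda_k(x_i)) \Big\}, \quad x_n \in C,\ 1 \le n \le N, \tag{9}$$

where $x^{n-1} = x_1 x_2 \cdots x_{n-1}$.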

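The subproblems $w(\cdot, \cdot)$ can be expressed recursively as

$$\begin{aligned}
w(n, x_n)
&= \max_{x^{n-1} \in C^{n-1}} \Big\{ \sum_{i=1}^{n-1} \Big[ \log P(x_i \mid x_{i-1}) + \sum_{k=1}^{K} \log P_k(y_{k,i} \mid \lambda_k(x_i)) \Big] \\
&\qquad\qquad + \log P(x_n \mid x_{n-1}) + \sum_{k=1}^{K} \log P_k(y_{k,n} \mid \lambda_k(x_n)) \Big\} \\
&= \max_{c \in C} \big\{ w(n-1, c) + \log P(x_n \mid c) \big\} + \sum_{k=1}^{K} \log P_k(y_{k,n} \mid \lambda_k(x_n)).
\end{aligned} \tag{10}$$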

Then, the solution of the optimization problem (8) is given
recursively in a backward manner by

$$\begin{aligned}
\hat{x}_N &= \arg\max_{c \in C} w(N, c), \\
\hat{x}_{n-1} &= \arg\max_{c \in C} \big\{ w(n-1, c) + \log P(\hat{x}_n \mid c) \big\}, \quad 2 \le n \le N.
\end{aligned} \tag{11}$$



Fig. 2. Graph G constructed for the JSC-MD MAP decoding (L = 5).



The recursion of $w(n, x_n)$ allows us to reduce the MAP
estimation problem to one of finding the longest path in a
weighted directed acyclic graph (WDAG) [12], as shown in
Fig. 2. The underlying graph G has $LN + 1$ vertices, which
consist of N stages with L vertices in each stage. Each stage
corresponds to a codeword position in x. Each vertex in a
stage represents a possible codeword at that position. There is
also one starting node $z_0$, corresponding to the beginning of
x.

In the construction of the graph G, each node is associated
with a codeword $x \in C$ at a sequence position n, $1 \le n \le N$,
and hence labeled by a pair $(n, x)$. From node $(n-1, b)$ to
node $(n, a)$, $a, b \in C$, there is a directed edge, whose weight
is

$$\log P(a \mid b) + \sum_{k=1}^{K} \log P_k(y_{k,n} \mid \lambda_k(a)).$$

From the starting node $z_0$ to each node $(1, a)$, there is an edge
whose weight is

$$\log P(a) + \sum_{k=1}^{K} \log P_k(y_{k,1} \mid \lambda_k(a)).$$

In graph G, the solution of the subproblem $w(n, a)$ is the
weight of the longest path from the starting node $z_0$ to node
$(n, a)$, which can be calculated recursively using dynamic
programming. The MAP decoding problem is then converted
into finding the longest path in graph G from the starting
node $z_0$ to nodes $(N, c)$, $c \in C$. By tracing back step by step
to the starting node $z_0$ as given in (11), the MDQ decoder
can reconstruct the input sequence x as $\hat{x}$, the optimal result
defined in (1).

Now we analyze the complexity of the proposed algorithm.
The dynamic programming algorithm proceeds from the starting
node $z_0$ to the nodes $(N, c)$, through all LN nodes in G.
The value of $w(n, a)$ can be evaluated in $O(L)$ time, according
to (10). The quantities $\log P(a \mid b)$ and $\log P_k(y_{k,n} \mid \lambda_k(a))$ can
be precomputed and stored in lookup tables so that they will
be available to the dynamic programming algorithm in $O(1)$
time. Hence the term $\sum_{k=1}^{K} \log P_k(y_{k,n} \mid \lambda_k(a))$ in (10) can be
computed in $O(K)$ time. Therefore, the total time complexity
of the dynamic programming algorithm is $O(L^2 N K)$. The
reconstruction of the input sequence takes only $O(N)$ time,
given that the selections in (11) (and in (10) as well) are
recorded, which results in a space complexity of $O(LN)$.
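
The recursion (10) and the traceback (11) can be sketched as a Viterbi-style dynamic program. This is a minimal illustration under stated assumptions, not the authors' implementation: codewords are represented by indices 0, ..., L-1, and the per-position channel term is passed in as a precomputed table `channel_lik[n][a]`, assumed to hold the product over k of $P_k(y_{k,n} \mid \lambda_k(a))$. All names are hypothetical.

```python
import math


def safe_log(p):
    """log(p), with log(0) mapped to -inf so impossible events are never preferred."""
    return math.log(p) if p > 0.0 else -math.inf


def jsc_md_map_decode(prior, trans, channel_lik):
    """Longest-path (MAP) decoding of a central-quantizer index sequence.

    prior[a]          : P(a)
    trans[b][a]       : P(a | b)
    channel_lik[n][a] : prod_k P_k(y_{k,n} | lambda_k(a)), position n (0-based)
    """
    N, L = len(channel_lik), len(prior)
    # Edges leaving the starting node.
    w = [safe_log(prior[a]) + safe_log(channel_lik[0][a]) for a in range(L)]
    back = []  # back[n-1][a] = best predecessor of codeword a at position n
    for n in range(1, N):
        new_w, choices = [], []
        for a in range(L):
            scores = [w[b] + safe_log(trans[b][a]) for b in range(L)]
            best_b = max(range(L), key=scores.__getitem__)
            new_w.append(scores[best_b] + safe_log(channel_lik[n][a]))
            choices.append(best_b)
        w = new_w
        back.append(choices)
    # Traceback from the best final node.
    x = [max(range(L), key=w.__getitem__)]
    for choices in reversed(back):
        x.append(choices[x[-1]])
    x.reverse()
    return x


# Hypothetical example: the middle observation weakly favors codeword 1,
# but a source with strong memory pulls the estimate back to codeword 0.
lik = [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]
with_memory = jsc_md_map_decode([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], lik)
without_memory = jsc_md_map_decode([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], lik)
```

In this toy case the decoder with source memory overrides the weak middle observation, whereas the memoryless setting follows each observation independently. A lost description can be modeled by setting its factor in `channel_lik` to a constant, so the same routine covers every subset of received descriptions.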

In [12] we proposed a monotonicity-based fast algorithm
for the problem of MAP estimation of Markov sequences
coded by a single description quantizer, which converts the
longest path problem to one of matrix search. For Gaussian
Markov sequences the matrix can be shown to be totally
monotone, and the search can be done in lower complexity.
The same algorithmic technique can be generalized to multiple
descriptions to reduce the complexity of JSC-MD MAP
decoding. In the appendix we prove that distributed MAP
decoding of a K-description scalar quantizer can be completed
in $O(LNK)$ time for Gaussian Markov sequences. The linear
dependency of the MAP MDSQ decoding algorithm on the
sequence length N and source codebook size L makes it
comparable in complexity to typical Viterbi-type decoders
for single description.

IV. Distributed Multiple-Description Estimation 
of Hidden Markov Sequences 

In this section we apply the proposed JSC-MD MAP 
estimation technique to solve the problem of hidden Markov 
sequence estimation in a resource-constrained network.
Single-description hidden Markov sequence estimation is an
extensively studied problem with many applications [14]. As
a case study, consider a sensor network in an inaccessible
area to monitor the local weather system for years with no
or little maintenance. Our objective is to remotely estimate
the time sequence of weather patterns: sunny, rainy, cloudy
and so on. To this end the sensors collect real-valued data
vectors (temperature, pressure, moisture, wind speed, etc.) and
communicate them to processing nodes of various means in
the network. Some are well-equipped and easily-maintained 
processing centers, while others need to run autonomously on 
limited power supply and react to certain weather conditions 
on their own rather than being instructed by the central control. 

A. Problem Formulation 

Our task is to estimate the state sequence of a hidden
Markov model (HMM), which is, in our example, the time
sequence of weather patterns that are not directly observable
by processing nodes in the sensor network. Let the state
space of the HMM be $S = \{s_1, s_2, \ldots, s_M\}$, being sunny,
rainy, and so on. For a state transition from $s_i$ to $s_j$ (weather
change) with Markov state transition probability $P_S(s_j \mid s_i)$, the
HMM output to be observed by the sensors is a real-valued
random vector $x \in \mathbb{R}^d$ (temperature, pressure, moisture, wind
speed, etc.) with probability $P_O(x \mid s_i, s_j)$. The observations
need to be communicated to data processing centers at a
low bit rate against channel noise and losses. To maximize
their operational lifetime the sensors have to do without
sophisticated source coding and forgo channel coding altogether.
A viable solution under such stringent conditions is
to produce and transmit $K \ge 2$ descriptions of x in fixed-length
code without entropy coding. There are many ways for
inexpensive and deprived encoders (sensors) to code x into
multiple descriptions in collaboration. One possibility is the
use of a multiple description lattice vector quantizer (MDLVQ)
[7], [15], [16].



Among known multiple description vector codes, MDLVQ
is arguably the most resource-conserving with a very simple
implementation. A K-description MDLVQ uses a fine lattice
in $\mathbb{R}^d$ as its central quantizer codebook $C$ and an accompanying
coarse lattice $C_s$ in $\mathbb{R}^d$ as its side quantizer codebooks
$C_k$, $1 \le k \le K$. Therefore we have $C_1 = C_2 = \cdots = C_K = C_s$,
and typically $C_s \subset C$. An MDLVQ index assignment is
depicted in Fig. 3 for K = 2. Each fine lattice point in $C$ is
labeled by a unique ordered pair of coarse lattice points in $C_s$.




Fig. 3. MDLVQ index assignment for the $A_2$ lattice, K = 2. Points of $C$ and
$C_s$ are marked by ■ and •, respectively.

Upon observing an HMM output sequence $\chi^N$, the central
quantizer first quantizes $\chi^N$ to a sequence of the nearest
fine lattice points $x = q(\chi^N)$. Then the MDLVQ encoder
generates K description sequences of x, $\lambda_k(x)$, $1 \le k \le K$, and transmits
them through K diversity channels (or diversity paths in the
network).

A decoder can reconstruct $x_n$ from $\lambda_k(x_n)$, $1 \le k \le K$, with the inverse
labeling function $\lambda^{-1}$, if all K descriptions are received. In the
event that only a subset $\mathcal{K}$ of the K descriptions is received,
the decoder reconstructs $x_n$ as the average of the received
coarse descriptions:

$$\hat{x}_n = \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \lambda_k(x_n) \tag{12}$$

where $|\cdot|$ is the cardinality of a set. This is the simplest
MDLVQ decoder possible, which is also asymptotically optimal
for K = 2 [17].
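
The averaging rule (12) is simple enough to state directly in code. The sketch below assumes each received description is a coarse lattice point given as a tuple of coordinates, with `None` marking a lost description; this representation and the function name are assumptions for illustration. The case in which all K descriptions arrive, which the text handles by the inverse labeling function, is not covered here.

```python
def mdlvq_hard_decode(received):
    """Average of the received coarse lattice points, as in (12).

    received : list of K entries, each a tuple of coordinates or None (lost)
    Returns a tuple of coordinates, or None if nothing was received.
    """
    points = [p for p in received if p is not None]
    if not points:
        return None  # complete transmission failure
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


# Hypothetical example with K = 3 descriptions in the plane, one of them lost.
estimate = mdlvq_hard_decode([(0.0, 2.0), None, (2.0, 4.0)])
```

The cost is proportional to the number of received descriptions, consistent with the $O(K)$ figure quoted for hard-decision decoding.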

B. Distributed MAP Sequence Estimation 

Let $y_k = y_{k,1} y_{k,2} \cdots y_{k,N}$ be the received sequence from
channel k, $1 \le k \le K$. Our task is to estimate the hidden state
sequence $z = z_1 z_2 \cdots z_N \in S^N$ of weather patterns, given the
K noisy time sequences of atmosphere attributes produced by
the HMM: $y_1, y_2, \ldots, y_K$. With resource scalability in
mind, we take an approach of MAP estimation:

$$\hat{z}, \hat{x} = \arg\max_{z \in S^N,\, x \in C^N} P(x, z \mid y_1, y_2, \ldots, y_K). \tag{13}$$

Analogously to (2) we use Bayes' theorem and the independence
of the K memoryless channels to obtain

$$\begin{aligned}
P(x, z \mid y_1, y_2, \ldots, y_K)
&\propto P(z)\, P(x \mid z)\, P(y_1, y_2, \ldots, y_K \mid x, z) \\
&= \prod_{n=1}^{N} \big\{ P_S(z_n \mid z_{n-1})\, P_O(x_n \mid z_n, z_{n-1}) \big\}\, P(y_1, y_2, \ldots, y_K \mid x) \\
&= \prod_{n=1}^{N} \Big\{ P_S(z_n \mid z_{n-1})\, P_O(x_n \mid z_n, z_{n-1}) \prod_{k=1}^{K} P_k(y_{k,n} \mid \lambda_k(x_n)) \Big\}.
\end{aligned} \tag{14}$$



We can also devise a graph theoretical algorithm to solve
the sequence estimation problem of (13). Combining (13) and
(14) and taking logarithms, we have

$$\hat{z}, \hat{x} = \arg\max_{z \in S^N,\, x \in C^N} \sum_{n=1}^{N} \Big\{ \log P_S(z_n \mid z_{n-1}) + \log P_O(x_n \mid z_n, z_{n-1}) + \sum_{k=1}^{K} \log P_k(y_{k,n} \mid \lambda_k(x_n)) \Big\}. \tag{15}$$



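Then, the MAP estimate of the sequence of hidden Markov
states is given by

$$\hat{z} = \arg\max_{z \in S^N} \sum_{n=1}^{N} \big\{ \log P_S(z_n \mid z_{n-1}) + \xi_n(z) \big\} \tag{16}$$

where

$$\xi_n(z) = \max_{x \in C} \Big\{ \log P_O(x \mid z_n, z_{n-1}) + \sum_{k=1}^{K} \log P_k(y_{k,n} \mid \lambda_k(x)) \Big\}.$$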

Using the same technique as used in (10), we structure the
above optimization problem into a nested set of subproblems:

$$w(n, z_n) = \max_{z^{n-1} \in S^{n-1}} \sum_{i=1}^{n} \big\{ \log P_S(z_i \mid z_{i-1}) + \xi_i(z) \big\}, \quad z_n \in S,\ 1 \le n \le N, \tag{17}$$

which can be expressed recursively by

$$w(n, z_n) = \max_{s \in S} \big\{ w(n-1, s) + \log P_S(z_n \mid s) + \xi(z_n, s) \big\} \tag{18}$$

where

$$\xi(z_n, s) = \max_{x \in C} \Big\{ \log P_O(x \mid z_n, s) + \sum_{k=1}^{K} \log P_k(y_{k,n} \mid \lambda_k(x)) \Big\}.$$

This recursive form also enables us to solve the sequence
estimation problem of (14) by finding the longest path in a
WDAG. The WDAG G contains $MN + 1$ vertices: a starting
node $z_0$ and N stages with M vertices in each stage. Each
stage corresponds to a position in the time sequence z. Each vertex
in a stage represents a possible HMM state at that position. The
starting node $z_0$ corresponds to the beginning of the sequence
z. From node $(n-1, a)$ to node $(n, b)$, $a, b \in S$, there is a
directed edge with weight

$$\log P_S(b \mid a) + \xi(b, a).$$



From the starting node $z_0$ to each node $(1, a)$, there is an
edge whose weight is

$$\log P_S(a) + \max_{x \in C} \Big\{ \log P_O(x \mid a) + \sum_{k=1}^{K} \log P_k(y_{k,1} \mid \lambda_k(x)) \Big\}.$$

In graph G, the solution of the subproblem $w(n, s)$ is the
weight of the longest path from the starting node $z_0$ to node
$(n, s)$, which can be calculated recursively using dynamic
programming. The distributed MAP estimation problem is then
converted into finding the longest path in graph G from the
starting node $z_0$ to nodes $(N, s)$, $s \in S$. Tracing back step by
step to the starting node $z_0$ generates the optimally estimated
HMM state sequence $\hat{z}$.

To analyze the complexity of the proposed algorithm, we
notice that the dynamic programming algorithm proceeds
through all MN nodes in G. The value of $w(n, s)$ can be
evaluated in $O(M)$ time, according to (18). The quantities
$\log P_S(b \mid a)$ and $\log P_O(x \mid z_n, s)$ can be precomputed and
stored in lookup tables so that they will be available in
the dynamic programming process in $O(1)$ time. The term
$\xi(b, a)$ can be computed in $O(KL)$ time. Therefore, the total
time complexity of this algorithm is $O(M^2 N K L)$. The space
complexity is $O(MN)$.
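
The recursion (18), including the inner maximization over codewords, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: states and codewords are indices, all probabilities are supplied as tables, and `channel_lik[n][x]` is assumed to hold the product over k of $P_k(y_{k,n} \mid \lambda_k(x))$. All function and parameter names are hypothetical.

```python
import math


def safe_log(p):
    """log(p), with log(0) mapped to -inf."""
    return math.log(p) if p > 0.0 else -math.inf


def hmm_map_states(ps0, ps, po0, po, channel_lik):
    """MAP estimate of the hidden state sequence by longest path in the WDAG.

    ps0[b]            : P_S(b) for the first state
    ps[a][b]          : P_S(b | a)
    po0[b][x]         : P_O(x | b) for the first output
    po[a][b][x]       : P_O(x | z_n = b, z_{n-1} = a)
    channel_lik[n][x] : prod_k P_k(y_{k,n} | lambda_k(x)), position n (0-based)
    """
    N, M, L = len(channel_lik), len(ps0), len(channel_lik[0])

    def xi(n, b, a):
        # Inner maximization over codewords x; a is None on edges from z_0.
        table = po0[b] if a is None else po[a][b]
        return max(safe_log(table[x]) + safe_log(channel_lik[n][x])
                   for x in range(L))

    w = [safe_log(ps0[b]) + xi(0, b, None) for b in range(M)]
    back = []
    for n in range(1, N):
        new_w, choices = [], []
        for b in range(M):
            scores = [w[a] + safe_log(ps[a][b]) + xi(n, b, a) for a in range(M)]
            best_a = max(range(M), key=scores.__getitem__)
            new_w.append(scores[best_a])
            choices.append(best_a)
        w = new_w
        back.append(choices)
    z = [max(range(M), key=w.__getitem__)]
    for choices in reversed(back):
        z.append(choices[z[-1]])
    z.reverse()
    return z


# Hypothetical two-state, two-codeword example; the output depends only on
# the current state, and the channel evidence switches at the last position.
emit = [[0.9, 0.1], [0.1, 0.9]]            # emit[state][x]
po = [[emit[0], emit[1]], [emit[0], emit[1]]]
states = hmm_map_states([0.5, 0.5], [[0.7, 0.3], [0.3, 0.7]], emit, po,
                        [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]])
```

The sketch evaluates $\xi$ afresh for every edge, which mirrors the $O(KL)$-per-edge accounting above once the channel products are formed; caching $\xi$ per position would be a natural refinement.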

C. Resource Scalability 

If a network node is not bounded by energy and computing
resources, it can use the relatively expensive MAP
algorithm that taps all available redundancies to obtain the
best estimate of the HMM state sequence, knowing the statistics
of the HMM and the underlying noisy diversity channels. This JSC-MD
framework can be used as an asymmetric codec in the
Wyner-Ziv spirit, which strips the encoders to the bone while
empowering the decoders. More importantly, it also offers
resource scalability. If a node in the sensor network needs to
estimate z but is severely limited in resources, it can still
do so using the same MDLVQ code, albeit probably at a
lesser estimation accuracy. The simplest, hence most resource-conserving,
approximate solution is to first perform a hard-decision
MDLVQ decoding of the received descriptions $y_{k,n}$ to
$\hat{x}_n$ using (12), and then estimate $z_n$ to be

$$\hat{z}_n = \arg\max_{z \in S} P_S(z)\, P_O(\hat{x}_n \mid z). \tag{19}$$


The hard-decision MDLVQ decoding takes only $O(K)$ operations.
Also, in the above approximation, we replace
$P_S(z_n \mid z_{n-1})$ by $P_S(z_n)$ and $P_O(x_n \mid z_n, z_{n-1})$ by $P_O(\hat{x}_n \mid z_n)$
in (14). This is to minimize the resource requirement for
estimating z by ignoring the source memory. Consequently,
the total time complexity of the fast algorithm reduces to
$O(N(K + M))$, as opposed to $O(M^2 N K L)$ for the full-fledged
MAP sequence estimation algorithm. The space requirement
drops even more drastically to $O(M + K)$ from
$O(MN)$.
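
The symbol-by-symbol rule (19) amounts to one pass over the M states per received symbol. A minimal sketch follows, assuming the observation likelihood is available as a callable `obs_lik(x_hat, z)`; that callable, the function name, and the toy likelihood in the example are all hypothetical stand-ins.

```python
def fast_state_estimate(x_hat, state_prior, obs_lik):
    """Return the state index z maximizing P_S(z) * P_O(x_hat | z), as in (19).

    x_hat       : hard-decision reconstruction of the current symbol
    state_prior : list with state_prior[z] = P_S(z)
    obs_lik     : callable (x_hat, z) -> value proportional to P_O(x_hat | z)
    """
    M = len(state_prior)
    scores = [state_prior[z] * obs_lik(x_hat, z) for z in range(M)]
    return max(range(M), key=scores.__getitem__)


# Hypothetical scalar example: two states whose outputs cluster around 0 and 5.
centers = [0.0, 5.0]


def toy_lik(x, z):
    # Unnormalized bell-shaped stand-in for P_O, chosen only for illustration.
    return 1.0 / (1.0 + (x - centers[z]) ** 2)


near_second = fast_state_estimate(4.2, [0.5, 0.5], toy_lik)
near_first = fast_state_estimate(0.3, [0.5, 0.5], toy_lik)
```

No state from previous positions is kept, which is what brings the memory footprint down to the tables themselves.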

Between the exact $O(M^2 N K L)$ graph theoretical algorithm
and the least expensive $O(N(K + M))$ algorithm, many trade-offs
can be made between the resource level and performance
of the decoder. For instance, if only $P_S(z_n \mid z_{n-1})$ is replaced
by $P_S(z_n)$ in (14), another possible JSC-MD estimation
emerges:

$$\hat{z}_n = \arg\max_{z \in S} P_S(z)\, P_O\Big( \arg\max_{x \in C} \prod_{k=1}^{K} P_k(y_{k,n} \mid \lambda_k(x)) \,\Big|\, z \Big). \tag{20}$$

This leads to an $O(N(KL + M))$ HMM state sequence
estimation algorithm. The algorithm is slightly more expensive
than the one based on (19) but offers better performance
because the MDLVQ decoding is done with the knowledge
of channel statistics.



D. Application in Distributed Speech Recognition 

Given the success of HMM in speech recognition [14], we 
envision the potential use of the JSC-MD estimation technique 
for remote speech recognition. For instance, cell phones
transmit quantized speech signals via diversity channels to
processing centers, and the recognized texts are sent back or
forwarded to other destinations. This will offer mobile users
speech recognition functionality without requiring heavy computing
power on handsets or quickly draining their batteries. Also, the
network speech recognizer can prompt a user to repeat in case
of difficulties; the user's revocalization can then be used as extra
descriptions to improve the JSC-MD estimation performance.

V. Resource-scalable JSC-MD MMSE Decoding 

The JSC-MD distributed MAP estimation problem discussed above is to track the discrete states of a hidden Markov model. Likewise, the cost function for distributed MAP decoding of MDQ requires the output symbols to be discrete codewords of the central quantizer. This may be desirable or even necessary if the communicated quantizer codewords correspond to discrete states with semantic meanings, as in some recognition and classification applications. But in network communication of a continuous signal $x = (x_1, x_2, \ldots, x_N)$, the JSC-MD output can be real valued. In this case a resource-scalable JSC-MD distributed MMSE decoding scheme is preferred, which is the topic of this section.

The goal of the JSC-MD MMSE decoding is to reconstruct $x_n$ as

$$\hat{x}_n = E(x_n \mid y_1, y_2, \ldots, y_K) = \sum_{l \in \mathcal{C}} P(x_n = l \mid y_1, y_2, \ldots, y_K)\, \frac{\int_{x \in V_l} x\, P(x)\, dx}{\int_{x \in V_l} P(x)\, dx}, \tag{21}$$

where $V_l$ is cell $l$ of the central quantizer. Hence we need to estimate the a posteriori probability $P(x_n = l \mid y_1, y_2, \ldots, y_K)$. Equivalently, we estimate

$$P(x_n = l, y_1, y_2, \ldots, y_K), \quad l \in \mathcal{C}. \tag{22}$$



We can solve the above estimation problem for the JSC-MD distributed MMSE decoding by extending the well-known BCJR (forward-backward) algorithm [18] to multiple observation sequences. For notational convenience, let $y_k^{a \sim b}$, $a \le b$, be the consecutive subsequence $y_{k,a}, y_{k,a+1}, \ldots, y_{k,b}$ of an observation sequence $y_k$. Define

$$\begin{aligned}
\alpha_n(l) &= P(x_n = l, y_1^{1 \sim n}, y_2^{1 \sim n}, \ldots, y_K^{1 \sim n}), \\
\beta_n(l) &= P(y_1^{n+1 \sim N}, y_2^{n+1 \sim N}, \ldots, y_K^{n+1 \sim N} \mid x_n = l), \\
\gamma_n(l', l) &= P(x_n = l, y_{1,n}, y_{2,n}, \ldots, y_{K,n} \mid x_{n-1} = l').
\end{aligned} \tag{23}$$

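Then we have

$$\begin{aligned}
P(x_n = l, y_1, y_2, \ldots, y_K)
&= P(x_n = l, y_1^{1 \sim n}, y_2^{1 \sim n}, \ldots, y_K^{1 \sim n}) \cdot P(y_1^{n+1 \sim N}, y_2^{n+1 \sim N}, \ldots, y_K^{n+1 \sim N} \mid x_n = l) \\
&= \alpha_n(l) \cdot \beta_n(l).
\end{aligned} \tag{24}$$

This factorization holds because $y_k^{1 \sim n}$ and $y_k^{n+1 \sim N}$ are independent given $x_n$, and because $y_k$ and $y_j$ are conditionally independent given the source sequence for $k \ne j$. The terms $\alpha_n(l)$ and $\beta_n(l)$ can be recursively computed by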

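$$\begin{aligned}
\alpha_n(l)
&= \sum_{l'=0}^{L-1} P(x_{n-1} = l', x_n = l, y_1^{1 \sim n}, y_2^{1 \sim n}, \ldots, y_K^{1 \sim n}) \\
&= \sum_{l'=0}^{L-1} \Big\{ P(x_{n-1} = l', y_1^{1 \sim n-1}, y_2^{1 \sim n-1}, \ldots, y_K^{1 \sim n-1}) \cdot P(x_n = l, y_{1,n}, y_{2,n}, \ldots, y_{K,n} \mid x_{n-1} = l') \Big\} \\
&= \sum_{l'=0}^{L-1} \alpha_{n-1}(l') \cdot \gamma_n(l', l)
\end{aligned} \tag{25}$$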

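and

$$\begin{aligned}
\beta_n(l)
&= \sum_{l'=0}^{L-1} P(x_{n+1} = l', y_1^{n+1 \sim N}, y_2^{n+1 \sim N}, \ldots, y_K^{n+1 \sim N} \mid x_n = l) \\
&= \sum_{l'=0}^{L-1} \Big\{ P(x_{n+1} = l', y_{1,n+1}, y_{2,n+1}, \ldots, y_{K,n+1} \mid x_n = l) \cdot P(y_1^{n+2 \sim N}, y_2^{n+2 \sim N}, \ldots, y_K^{n+2 \sim N} \mid x_{n+1} = l') \Big\} \\
&= \sum_{l'=0}^{L-1} \gamma_{n+1}(l, l') \cdot \beta_{n+1}(l').
\end{aligned} \tag{26}$$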



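By definition, the term $\gamma_n(\cdot, \cdot)$ can be computed by

$$\begin{aligned}
\gamma_n(l', l)
&= P(x_n = l, y_{1,n}, y_{2,n}, \ldots, y_{K,n} \mid x_{n-1} = l') \\
&= P(x_n = l \mid x_{n-1} = l') \cdot P(y_{1,n}, y_{2,n}, \ldots, y_{K,n} \mid x_n = l) \\
&= P(x_n = l \mid x_{n-1} = l') \cdot \prod_{k=1}^{K} P_k\big(y_{k,n} \mid \lambda_k(l)\big).
\end{aligned} \tag{27}$$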
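The recursions (25)-(27) and the reconstruction (21) can be sketched in Python as below. This is a minimal sketch under assumptions that the text does not fix: the channels are passed as lookup tables `chan[k][y, i]` for $P_k(y \mid i)$, the index assignment as `lam[k, l]` for $\lambda_k(l)$, the recursion is started from an assumed initial distribution `init[l]` for $P(x_1 = l)$, and $\alpha$, $\beta$ are rescaled at every step to avoid numerical underflow (the rescaling cancels in the final posterior). All names and toy numbers are hypothetical.

```python
import numpy as np

def jscmd_posteriors(y, lam, chan, trans, init):
    """Posteriors P(x_n = l | y_1, ..., y_K) via the recursions (25)-(27).

    y     : K x N int array of received symbols
    lam   : K x L int array, lam[k, l] = lambda_k(l)
    chan  : list of K arrays, chan[k][y, i] = P_k(y | i)
    trans : L x L array, trans[lp, l] = P(x_n = l | x_{n-1} = lp)
    init  : length-L array, assumed P(x_1 = l)
    """
    K, N = y.shape
    L = lam.shape[1]
    # like[n, l] = prod_k P_k(y_{k,n} | lambda_k(l)), the product in (27).
    like = np.ones((N, L))
    for k in range(K):
        like *= chan[k][y[k][:, None], lam[k][None, :]]
    alpha = np.zeros((N, L))
    beta = np.ones((N, L))
    alpha[0] = init * like[0]
    alpha[0] /= alpha[0].sum()
    for n in range(1, N):
        # (25) with gamma_n(l', l) = trans[l', l] * like[n, l]
        alpha[n] = (alpha[n - 1] @ trans) * like[n]
        alpha[n] /= alpha[n].sum()
    for n in range(N - 2, -1, -1):
        # (26) with gamma_{n+1}(l, l') = trans[l, l'] * like[n + 1, l']
        beta[n] = trans @ (like[n + 1] * beta[n + 1])
        beta[n] /= beta[n].sum()
    post = alpha * beta                      # (24), up to a scale factor
    return post / post.sum(axis=1, keepdims=True)

def mmse_reconstruct(post, centroids):
    """Reconstruction (21): posterior-weighted sum of the cell centroids."""
    return post @ centroids

# Toy setup: K = 2 binary descriptions, L = 3 central cells, N = 3 samples.
lam = np.array([[0, 0, 1], [0, 1, 1]])
bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
chan = [bsc, bsc]
trans = np.array([[0.80, 0.15, 0.05],
                  [0.10, 0.80, 0.10],
                  [0.05, 0.15, 0.80]])
init = np.array([0.3, 0.4, 0.3])
y = np.array([[0, 1, 1],
              [0, 0, 1]])
post = jscmd_posteriors(y, lam, chan, trans, init)
print(mmse_reconstruct(post, np.array([-1.0, 0.0, 1.0])))
```

An erased description can be represented in this layout by an extra row of `chan[k]` whose entries are equal for all indices, so that it contributes no information to `like`.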

If the input sequence $x$ is i.i.d., the joint probability in (24) reduces to

$$\begin{aligned}
P(x_n = l, y_1, y_2, \ldots, y_K)
&= P(x_n = l, y_{1,n}, y_{2,n}, \ldots, y_{K,n}) \\
&= P(x_n = l) \cdot P(y_{1,n}, y_{2,n}, \ldots, y_{K,n} \mid x_n = l) \\
&= P(x_n = l) \cdot \prod_{k=1}^{K} P_k\big(y_{k,n} \mid \lambda_k(l)\big), \quad 1 \le n \le N.
\end{aligned} \tag{28}$$

Fig. 4. The index assignment for the two-description scalar quantizer proposed in [15].

This is also the scheme for hard-decision MDQ MMSE decoding in the presence of inversion and erasure errors.
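A symbol-by-symbol decoder based on (28) can be sketched as follows. The table layout (`chan[k][y, i]` for $P_k(y \mid i)$, `lam[k, l]` for $\lambda_k(l)$), the centroid vector and the toy numbers are assumptions of this example rather than details given in the text.

```python
import numpy as np

def symbolwise_mmse(y, lam, chan, prior, centroids):
    """Symbol-by-symbol MMSE decoding based on (28).

    y         : K x N int array of received symbols
    lam       : K x L int array, lam[k, l] = lambda_k(l)
    chan      : list of K arrays, chan[k][y, i] = P_k(y | i)
    prior     : length-L array, prior[l] = P(x_n = l)
    centroids : length-L array of central-cell centroids, as in (21)
    """
    K, N = y.shape
    post = np.tile(np.asarray(prior, dtype=float), (N, 1))
    for k in range(K):                       # O(LK) work per sample
        post *= chan[k][y[k][:, None], lam[k][None, :]]
    post /= post.sum(axis=1, keepdims=True)  # normalise (28) to a posterior
    return post @ centroids

# Toy setup: K = 2 binary descriptions of L = 3 central cells.
lam = np.array([[0, 0, 1], [0, 1, 1]])
bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
chan = [bsc, bsc]
prior = np.full(3, 1.0 / 3.0)
centroids = np.array([-1.0, 0.0, 1.0])
y = np.array([[1], [1]])
print(np.round(symbolwise_mmse(y, lam, chan, prior, centroids), 3))  # [0.879]
```

Because each sample is decoded on its own, the sketch needs no transition matrix and no backward pass, which is where the factor-of-$L$ saving over the full recursion comes from.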

Now we analyze the complexity of the proposed JSC-MD MMSE algorithm. For each $n$, $1 \le n \le N$, we need to calculate the values of $\alpha_n(l)$, $\beta_n(l)$ and $\gamma_n(l', l)$. As explained in the complexity analysis of the JSC-MD MAP algorithm, the term $\sum_{k=1}^{K} \log P_k(y_{k,n} \mid \lambda_k(a))$ can be computed in $O(K)$ time, and the same holds for the product in (27). Thus, the matrix $\gamma_n(l', l)$, $(l, l') \in \mathcal{C}^2$, can be computed in $O(L^2 K)$ time. For each $l$, the values of $\alpha_n(l)$ and $\beta_n(l)$ can then be computed in $O(L)$ time. Therefore, the total complexity is $O(L^2 KN)$, which is of the same order as the complexity of the JSC-MD MAP algorithm derived in Section III.

If the sequence $x$ is i.i.d., the complexity of JSC-MD MMSE decoding is reduced to $O(LKN)$, as exhibited by (28). For memoryless sources, MMSE sequence estimation degenerates to MMSE symbol-by-symbol decoding. Even if $x$ is not memoryless, in consideration of resource scalability, (28) can still be used as a less demanding alternative by network nodes that lack the resources to perform full-fledged JSC-MD MMSE decoding. The approximation is good if the source memory is weak. In the extreme, the weakest network nodes, under severe resource constraints, can always resort to hard-decision MD decoding (e.g., using the MD decoder (12)), which takes only $O(KN)$ time to decode a multiple-description coded sequence $x$ of length $N$. The important point is that all three decoders, with complexities ranging from $O(L^2 KN)$ to $O(KN)$, operate on the same MD code streams distributed in the network. The next section discusses the issue of resource scalability further.

VI. Simulation Results 

The proposed resource-aware JSC-MD distributed MAP and MMSE decoding algorithms are implemented and evaluated via simulations. The simulation inputs are first-order, zero-mean, unit-variance Gaussian Markov sequences with different correlation coefficients $\rho$. A fixed-rate two-description scalar quantizer (2DSQ) proposed in [15] is used as the encoder in our simulations. The 2DSQ is uniform and is specified by the index assignment matrix shown in Fig. 4. The central quantizer has $L = 21$ codecells and the two side quantizers each have $L_1 = L_2 = 8$ codecells. For each description $k$, $k = 1, 2$, the codeword index $\lambda_k(x)$ is transmitted in a fixed-length code of three bits.
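A first-order, zero-mean, unit-variance Gaussian Markov sequence of the kind used as simulation input can be generated as sketched below. The function name, the seed and the sequence length are arbitrary choices for illustration; the 2DSQ encoder and the index assignment of Fig. 4 are not reproduced here.

```python
import numpy as np

def gauss_markov(n, rho, rng):
    """First-order, zero-mean, unit-variance Gaussian Markov sequence."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    scale = np.sqrt(1.0 - rho ** 2)   # keeps the marginal variance at 1
    for i in range(1, n):
        x[i] = rho * x[i - 1] + scale * rng.standard_normal()
    return x

rng = np.random.default_rng(0)
x = gauss_markov(100_000, 0.9, rng)
# Empirical variance and lag-one correlation; these should be close to
# 1 and rho respectively for a long sequence.
print(x.var(), np.corrcoef(x[:-1], x[1:])[0, 1])
```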

Fig. 5. Symbol error rates of the JSC-MD distributed MAP decoder and the MDQ hard-decision decoder with $\rho = 0, 0.5, 0.9$.

The channels are simulated as error-and-erasure channels with identical erasure probability $p_\phi$ and inversion probability $p_c$, both of which are varied. We report and discuss below the simulation results for different combinations of $p_c$, $p_\phi$ and $\rho$.
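One possible way to model such an error-and-erasure channel for the three-bit side indices is sketched below. The details are assumptions, since the text does not spell them out: a codeword is erased as a whole with probability `p_e` (standing for $p_\phi$), and otherwise each of its three bits is inverted independently with probability `p_c`. The marker `ERASED` and both function names are hypothetical.

```python
import numpy as np

BITS = 3
ERASED = 1 << BITS   # hypothetical marker (value 8) for an erased codeword

def channel_table(p_c, p_e):
    """Table T[y, i] = P(y | i) for sent index i and received y (8 = erased)."""
    n = 1 << BITS
    table = np.empty((n + 1, n))
    for i in range(n):
        for y in range(n):
            d = bin(i ^ y).count("1")    # number of inverted bits
            table[y, i] = (1 - p_e) * p_c ** d * (1 - p_c) ** (BITS - d)
    table[ERASED, :] = p_e               # an erasure carries no information
    return table

def transmit(idx, p_c, p_e, rng):
    """Pass an array of 3-bit indices through the assumed channel."""
    idx = np.asarray(idx)
    flips = rng.random((idx.size, BITS)) < p_c
    mask = (flips * (1 << np.arange(BITS))).sum(axis=1)
    out = idx ^ mask
    out[rng.random(idx.size) < p_e] = ERASED
    return out

rng = np.random.default_rng(0)
table = channel_table(p_c=0.01, p_e=0.1)
print(table.sum(axis=0))                 # each column is a distribution
print(transmit([0, 3, 7, 5], 0.01, 0.1, rng))
```

A table of this form can serve as the per-description likelihood $P_k(y \mid \lambda_k(x))$ needed by the soft decoders, with the erased row leaving the posterior unchanged.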

First, we evaluate the performance of the JSC-MD distributed MAP decoder. The performance measure is the symbol error rate (SER), which is the probability that a symbol of the input Markov sequence is incorrectly decoded. Since the input source is Gaussian Markov, the $O(LKN)$ MAP algorithm of Section III can be used by the resource-rich network nodes to obtain the optimal estimate. However, resource-deprived network nodes can also decode whatever description(s) of the same 2DSQ code they receive, using a simple energy-conserving $O(KN)$ hard-decision MDQ decoder. The simulation results are plotted in Fig. 5. Over all values of $\rho$, $p_c$ and $p_\phi$, the JSC-MD MAP decoder outperforms the hard-decision MDQ decoder. As expected, the performance gap between the two decoders increases as the amount of memory in the Markov source ($\rho$) increases. This is because the hard-decision MDQ decoder cannot benefit from the residual source redundancy left by the suboptimal primitive 2DSQ encoder.

In the case of JSC-MD distributed MMSE decoding, we evaluate three decoders of different complexities (hence different resource requirements): the exact $O(L^2 KN)$ algorithm derived in Section V, the simplified $O(LKN)$ algorithm given in (28), and the conventional $O(KN)$ hard-decision MDQ decoder. The performance measure for MMSE decoding is naturally the signal-to-noise ratio (SNR). The simulation results are plotted in Figs. 6-8, with the correlation coefficient $\rho$ being 0, 0.5 and 0.9, respectively. The trade-offs between the complexity and performance of a decoder can be clearly seen in these figures. For given $\rho$, $p_c$ and $p_\phi$, the SNR increases as the decoder complexity increases. The JSC-MD MMSE decoder achieves the highest SNR because it utilizes both inter- and intra-description correlations. The algorithm given in (28) lies in the middle in both performance and cost: it is a factor of $O(L)$ faster than the full-fledged JSC-MD MMSE decoder but a factor of $O(L)$ slower than the hard-decision MDQ decoder. This decoder reduces the complexity and energy requirement by making use of the inter-description correlation only. The hard-decision MDQ decoder is the simplest and fastest. However, it ignores both intra- and inter-description correlations and has the lowest SNR. As in the MAP case, the performance gap between the first two MMSE decoders increases as the intra-description redundancy ($\rho$) increases. When $\rho = 0$, the first two algorithms become the same.

Fig. 6. SNR performances of different MDQ decoders ($\rho = 0$).

Fig. 7. SNR performances of different MDQ decoders ($\rho = 0.5$).

Under both the MAP and MMSE criteria, the performance gap between different algorithms increases as the erasure error probability increases, indicating that the JSC-MD distributed decoder can make better use of inter-description correlation in the event of packet loss. As the erasure error probability increases in the network, the proposed JSC-MD decoder enjoys up to 8 dB gain over the hard-decision MD decoders.

Finally, we point out that even when the source memory is weak (see the curves for $\rho = 0$), the JSC-MD distributed decoders still have an advantage over the hard-decision MDQ decoders, which cannot handle the bit errors within a received description effectively.

Fig. 8. SNR performances of different MDQ decoders ($\rho = 0.9$). Panels: $p_\phi = 0.1$, $p_\phi = 0.01$, $p_\phi = 0.001$.

VII. Conclusions 

We propose a joint source-channel multiple description approach to resource-scalable network communications. The encoder complexity is kept to a minimum by fixed-rate multiple description quantization. The resulting MD code streams are distributed in the network and can be reconstructed to different qualities depending on the resource levels of receiver nodes. Algorithms for distributed MAP and MMSE sequence estimation are developed, and they exploit intra- and inter-description redundancies jointly to correct both bit errors and erasure errors. The new algorithms outperform the existing hard-decision MDQ decoders by large margins (up to 8 dB). If the source is Gaussian Markov, the complexity of the JSC-MD distributed MAP estimation algorithm is $O(LNK)$, which is the same as that of the classic Viterbi algorithm for a single description.

Operationally, the new MDQ decoding technique unifies the 
treatments of different subsets of descriptions available at a 
decoder, overcoming the difficulty of having a large number of 
side decoders that hinders the design of a good hard-decision 
MDQ decoder. 

Appendix 

Complexity Reduction of JSC-MD Problem 

The complexity of the JSC-MD MAP decoding problem in Section III can be reduced because the problem has a strong monotonicity property if the source is Gaussian Markov and is coded by a multiple description scalar quantizer (MDSQ). To show this, we need to convert the recursion formula in Section III into a matrix search form [12]. We rewrite the recursion as

$$w(n, a) = \max_{b} \Big\{ w(n-1, b) + \log P(a \mid b) + \sum_{k=1}^{K} \log P_k\big(y_{k,n} \mid \lambda_k(a)\big) \Big\}. \tag{29}$$

Then for each $1 \le n \le N$, we define an $L \times L$ matrix $A_n$ such that

$$A_n(a, b) = w(n-1, b) + \log P(a \mid b) + \sum_{k=1}^{K} \log P_k\big(y_{k,n} \mid \lambda_k(a)\big). \tag{30}$$

Now one can see that the computational task for JSC-MD MAP decoding is to find the row maxima of matrix $A_n$.
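The recursion (29) with the matrix $A_n$ of (30) can be sketched in Python as below, using a plain $O(L^2)$ row-maxima search per step rather than the fast search discussed next. The input layout (`loglike[n, a]` for the summed channel log-likelihoods, `logtrans[a, b]` for $\log P(a \mid b)$) and the initial condition `loginit` are assumptions of this example.

```python
import numpy as np

def jscmd_map(loglike, logtrans, loginit):
    """MAP sequence via the recursion (29), with a brute-force row search.

    loglike  : N x L array, loglike[n, a] = sum_k log P_k(y_{k,n} | lambda_k(a))
    logtrans : L x L array, logtrans[a, b] = log P(a | b)
    loginit  : length-L array, assumed log P(x_1 = a)
    """
    N, L = loglike.shape
    w = loginit + loglike[0]
    back = np.zeros((N, L), dtype=int)
    for n in range(1, N):
        # A[a, b] = w(n-1, b) + log P(a | b) + loglike[n, a], as in (30)
        A = w[None, :] + logtrans + loglike[n][:, None]
        back[n] = A.argmax(axis=1)     # position of each row maximum
        w = A.max(axis=1)              # w(n, a) for all a
    # Trace the best path backwards from the best final symbol.
    path = [int(w.argmax())]
    for n in range(N - 1, 0, -1):
        path.append(int(back[n, path[-1]]))
    return path[::-1]

# Toy example with random (hypothetical) statistics.
rng = np.random.default_rng(1)
N, L = 4, 3
loglike = np.log(rng.random((N, L)))
T = rng.random((L, L))
T /= T.sum(axis=0, keepdims=True)      # column b holds P(. | b)
logtrans = np.log(T)
loginit = np.log(np.full(L, 1.0 / L))
print(jscmd_map(loglike, logtrans, loginit))
```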

A two-dimensional matrix $A = A(a, b)$ is said to be totally monotone with respect to row maxima if the following relation holds:

$$A(a, b) \le A(a, b') \;\Rightarrow\; A(a', b) \le A(a', b'), \quad a < a',\ b < b'. \tag{31}$$

A sufficient condition for (31) is

$$A(a, b') + A(a', b) \le A(a, b) + A(a', b'), \quad a < a',\ b < b', \tag{32}$$

which is also known as the Monge condition. If an $n \times n$ matrix $A$ is totally monotone, then the row maxima of $A$ can be found in $O(n)$ time [19].
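The sketch below checks the Monge condition (32) numerically and finds row maxima by exploiting the monotonicity (31). Note that the search shown is a simple divide-and-conquer scheme, not the linear-time algorithm of [19]; it is swapped in here because it is short, and it costs $O(n \log n)$ for an $n \times n$ matrix. Function names and the test matrix are illustrative.

```python
import numpy as np

def is_monge(A, tol=1e-12):
    """Check (32) on adjacent rows and columns, which implies it for all pairs."""
    A = np.asarray(A, dtype=float)
    return bool(np.all(A[:-1, 1:] + A[1:, :-1] <= A[:-1, :-1] + A[1:, 1:] + tol))

def row_argmax_monotone(A):
    """Rightmost row maxima of a totally monotone matrix.

    Under (31) the rightmost maximiser moves right (or stays) as the row
    index grows, so each half of the rows needs only part of the columns.
    """
    A = np.asarray(A)
    n_rows, n_cols = A.shape
    arg = np.zeros(n_rows, dtype=int)

    def solve(r_lo, r_hi, c_lo, c_hi):
        if r_lo > r_hi:
            return
        mid = (r_lo + r_hi) // 2
        row = A[mid, c_lo:c_hi + 1]
        # rightmost maximiser inside the admissible column window
        j = c_lo + (len(row) - 1 - int(np.argmax(row[::-1])))
        arg[mid] = j
        solve(r_lo, mid - 1, c_lo, j)
        solve(mid + 1, r_hi, j, c_hi)

    solve(0, n_rows - 1, 0, n_cols - 1)
    return arg

# A(a, b) = -(a - b)^2 satisfies (32); its row maxima lie on the diagonal.
idx = np.arange(6)
A = -(np.subtract.outer(idx, idx) ** 2)
print(is_monge(A), row_argmax_monotone(A))
```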

To apply the linear-time matrix search algorithm to the joint source-channel MDSQ decoding problem, we only need to show that matrix $A_n$ is totally monotone. Substituting $A_n$ in (30) for $A$ in (32), we have

$$\log P(a \mid b') + \log P(a' \mid b) \le \log P(a \mid b) + \log P(a' \mid b'), \quad a < a',\ b < b', \tag{33}$$

which is a sufficient condition for $A_n$ to be totally monotone and therefore for the fast algorithm to be applicable. This condition, which depends only on the source statistics and not on the channels, is exactly the same as the one derived in [12]. It was shown in [12] that (33) holds if the source is Gaussian Markov, which covers a large family of signals studied in practice and theory.
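To give some intuition for why (33) is plausible for Gaussian Markov sources, the following worked check evaluates the same inequality for the conditional density at individual points instead of for the transition probabilities between quantizer cells. This is only a pointwise analogue under the stated assumptions (unit-variance first-order model, nonnegative correlation); it is not the proof given in [12].

```latex
% Assumption: x_n = \rho x_{n-1} + w_n, with w_n ~ N(0, 1 - \rho^2) and 0 \le \rho < 1.
\[
\log p(a \mid b) = -\frac{(a - \rho b)^2}{2(1 - \rho^2)}
                   - \tfrac{1}{2} \log\!\big(2\pi (1 - \rho^2)\big)
\]
% The squared and constant terms cancel in the four-term difference,
% leaving only the cross terms in a and b:
\[
\log p(a \mid b') + \log p(a' \mid b) - \log p(a \mid b) - \log p(a' \mid b')
  = -\frac{\rho\,(a' - a)(b' - b)}{1 - \rho^2} \le 0,
  \qquad a < a',\; b < b'.
\]
```

The sign of the difference is controlled entirely by $\rho$, which is consistent with the remark above that the condition depends on the source statistics and not on the channels.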

Finally, we conclude that the time complexity of MAP decoding of MDSQ can be reduced to $O(LNK)$ for Gaussian Markov sequences.